Results 1 - 20 of 4,314
1.
Zhongguo Zhong Yao Za Zhi ; 49(3): 836-841, 2024 Feb.
Article in Chinese | MEDLINE | ID: mdl-38621887

ABSTRACT

This study aims to construct the element relationships and extension path of a clinical evidence knowledge map for Chinese patent medicine, providing basic technical support for the formation and transformation of the evidence chain of Chinese patent medicine and providing collection, induction, and summary schemes for massive and disorganized clinical data. Based on the elements of evidence-based PICOS, the conventional construction methods of knowledge graphs were collected and summarized. First, the data entities related to Chinese patent medicine were classified, and entity linking (disambiguation) was performed. Second, the attribute information of the data entities was associated and classified. Finally, the logical relationships between entities were constructed, and the element relationships and extension path of a knowledge map conforming to the characteristics of clinical evidence of Chinese patent medicine were summarized. The construction of the clinical evidence knowledge map of Chinese patent medicine was mainly based on process design and logical structure, and the element relationships of the knowledge map were expressed according to the PICOS principle and evidence level. The extension path crossed three levels (model layer, data layer application, and new evidence application), and the study gradually explored the path from disease, core evaluation indicators, Chinese patent medicine, core prescriptions, syndrome and treatment rules, and medical case comparison (evolution law) to new drug research and development. This study has clarified the top-level design for constructing the clinical evidence knowledge map of Chinese patent medicine, but the work still needs joint interdisciplinary efforts. With the continuous improvement of map construction technology in line with the characteristics of TCM, the study can provide necessary basic technical support and reference for the development of the TCM discipline.


Subjects
Drugs, Chinese Herbal , Drugs, Chinese Herbal/therapeutic use , Medicine, Chinese Traditional , Nonprescription Drugs/therapeutic use , Technology , Data Mining/methods
2.
Sci Rep ; 14(1): 7635, 2024 04 01.
Article in English | MEDLINE | ID: mdl-38561391

ABSTRACT

Extracting knowledge from hybrid data, comprising both categorical and numerical data, poses significant challenges due to the inherent difficulty in preserving information and practical meanings during the conversion process. To address this challenge, hybrid data processing methods, combining complementary rough sets, have emerged as a promising approach for handling uncertainty. However, selecting an appropriate model and effectively utilizing it in data mining requires a thorough qualitative and quantitative comparison of existing hybrid data processing models. This research aims to contribute to the analysis of hybrid data processing models based on neighborhood rough sets by investigating the inherent relationships among these models. We propose a generic neighborhood rough set-based hybrid model specifically designed for processing hybrid data, thereby enhancing the efficacy of the data mining process without resorting to discretization and avoiding information loss or practical meaning degradation in datasets. The proposed scheme dynamically adapts the threshold value for the neighborhood approximation space according to the characteristics of the given datasets, ensuring optimal performance without sacrificing accuracy. To evaluate the effectiveness of the proposed scheme, we develop a testbed tailored for Parkinson's patients, a domain where hybrid data processing is particularly relevant. The experimental results demonstrate that the proposed scheme consistently outperforms existing schemes in adaptively handling both numerical and categorical data, achieving an impressive accuracy of 95% on the Parkinson's dataset. Overall, this research contributes to advancing hybrid data processing techniques by providing a robust and adaptive solution that addresses the challenges associated with handling hybrid data, particularly in the context of Parkinson's disease analysis.


Subjects
Algorithms , Parkinson Disease , Humans , Data Mining/methods , Uncertainty
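
The neighborhood rough set idea in the abstract above can be illustrated with a small sketch. It is not the authors' model: the distance function, the fixed threshold `delta`, the assumption that numerical attributes are scaled to [0, 1], and the toy samples are all hypothetical, and the adaptive threshold selection the abstract mentions is not reproduced because the abstract does not describe it.

```python
from typing import List, Sequence, Set, Tuple, Union

Value = Union[float, str]  # numerical attributes as floats, categorical as strings


def hybrid_distance(a: Sequence[Value], b: Sequence[Value]) -> float:
    """Largest per-attribute difference: |x - y| for numbers, 0/1 mismatch for categories."""
    worst = 0.0
    for x, y in zip(a, b):
        if isinstance(x, str) or isinstance(y, str):
            d = 0.0 if x == y else 1.0
        else:
            d = abs(x - y)
        worst = max(worst, d)
    return worst


def neighborhood(i: int, samples: List[Sequence[Value]], delta: float) -> Set[int]:
    """Indices of all samples within distance delta of sample i (including i itself)."""
    return {j for j, s in enumerate(samples) if hybrid_distance(samples[i], s) <= delta}


def approximations(samples, labels, target, delta) -> Tuple[Set[int], Set[int]]:
    """Lower and upper neighborhood approximations of the class `target`."""
    members = {i for i, y in enumerate(labels) if y == target}
    lower, upper = set(), set()
    for i in range(len(samples)):
        nb = neighborhood(i, samples, delta)
        if nb <= members:   # whole neighborhood inside the class: certainly a member
            lower.add(i)
        if nb & members:    # neighborhood touches the class: possibly a member
            upper.add(i)
    return lower, upper


# Hypothetical toy data: (scaled numerical feature, categorical feature)
samples = [(0.10, "tremor"), (0.15, "tremor"), (0.80, "none"), (0.12, "tremor")]
labels = ["pd", "pd", "healthy", "healthy"]
lower, upper = approximations(samples, labels, "healthy", delta=0.1)
```

The gap between the upper and lower sets is the boundary region, which is where the choice of `delta` matters; no discretization of the numerical column is needed at any point.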
3.
Food Chem Toxicol ; 187: 114638, 2024 May.
Article in English | MEDLINE | ID: mdl-38582341

ABSTRACT

With a society increasingly demanding alternative protein food sources, new strategies for evaluating protein safety issues, such as allergenic potential, are needed. Large-scale and systemic studies on allergenic proteins are hindered by the limited and non-harmonized clinical information available for these substances in dedicated databases. A key missing piece of information is the symptomatology of the allergens, especially expressed in terms of standard vocabularies, which would allow connecting with other biomedical resources to carry out different studies related to human health. In this work, we have generated the first resource with a comprehensive annotation of allergens' symptomatology, using a text-mining approach that extracts significant co-mentions between these entities from the scientific literature (PubMed, ∼36 million abstracts). The method identifies statistically significant co-mentions between the textual descriptions of the two types of entities in the literature as an indication of a relationship. A total of 1,180 clinical signs extracted from the Human Phenotype Ontology and the Medical Subject Heading terms of PubMed, together with other allergen-specific symptoms, were linked to 1,036 unique allergens annotated in two main allergen-related public databases via 14,009 relationships. This novel resource, publicly available through an interactive web interface, could serve as a starting point for future manually curated compilations of allergen symptomatology.


Subjects
Allergens , Data Mining , Humans , Data Mining/methods , Databases, Factual , Proteins/metabolism
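
One common way to decide whether two entities are co-mentioned more often than chance is a one-sided hypergeometric (Fisher-type) test. The abstract above does not say which statistic the authors used, so the sketch below is an assumption, and the function name and counts are hypothetical.

```python
from math import comb


def comention_p_value(n_total: int, n_a: int, n_b: int, n_both: int) -> float:
    """P(X >= n_both) for X ~ Hypergeometric(n_total, n_a, n_b).

    n_total: abstracts in the corpus
    n_a:     abstracts mentioning entity A (e.g. an allergen)
    n_b:     abstracts mentioning entity B (e.g. a symptom)
    n_both:  abstracts mentioning both
    """
    denom = comb(n_total, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k)
        for k in range(n_both, min(n_a, n_b) + 1)
    )
    return tail / denom


# Hypothetical counts: 3 of 10 abstracts mention A, 3 mention B, all 3 overlap.
p = comention_p_value(10, 3, 3, 3)
```

Exact integer binomials keep the sketch simple but become slow for corpus-scale counts; a real pipeline would more likely use a log-space implementation and correct for multiple testing across entity pairs.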
4.
BMC Med Inform Decis Mak ; 24(Suppl 3): 98, 2024 Apr 17.
Article in English | MEDLINE | ID: mdl-38632621

ABSTRACT

BACKGROUND: Tremendous research efforts have been made in the Alzheimer's disease (AD) field to understand the disease etiology, progression and discover treatments for AD. Many mechanistic hypotheses, therapeutic targets and treatment strategies have been proposed in the last few decades. Reviewing previous work and staying current on this ever-growing body of AD publications is an essential yet difficult task for AD researchers. METHODS: In this study, we designed and implemented a natural language processing (NLP) pipeline to extract gene-specific neurodegenerative disease (ND) -focused information from the PubMed database. The collected publication information was filtered and cleaned to construct AD-related gene-specific publication profiles. Six categories of AD-related information are extracted from the processed publication data: publication trend by year, dementia type occurrence, brain region occurrence, mouse model information, keywords occurrence, and co-occurring genes. A user-friendly web portal is then developed using Django framework to provide gene query functions and data visualizations for the generalized and summarized publication information. RESULTS: By implementing the NLP pipeline, we extracted gene-specific ND-related publication information from the abstracts of the publications in the PubMed database. The results are summarized and visualized through an interactive web query portal. Multiple visualization windows display the ND publication trends, mouse models used, dementia types, involved brain regions, keywords to major AD-related biological processes, and co-occurring genes. Direct links to PubMed sites are provided for all recorded publications on the query result page of the web portal. 
CONCLUSION: The resulting portal is a valuable tool and data source for quickly querying and displaying AD publications tailored to users' research areas and gene targets of interest, which is especially convenient for users without informatics mining skills. Our study will not only keep AD researchers updated on the progress of AD research and help them conduct preliminary examinations efficiently, but also offer additional support for hypothesis generation and validation, which will contribute significantly to the communication, dissemination, and progress of AD research.


Subjects
Alzheimer Disease , Neurodegenerative Diseases , Animals , Mice , Data Mining/methods , PubMed , Databases, Factual
5.
Yonsei Med J ; 65(4): 210-216, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38515358

ABSTRACT

PURPOSE: The purpose of this study was to use data mining methods to establish a simple and reliable predictive model based on the risk factors related to gallbladder stones (GS) to assist in their diagnosis and reduce medical costs. MATERIALS AND METHODS: This was a retrospective cross-sectional study. A total of 4215 participants underwent annual health examinations between January 2019 and December 2019 at the Physical Examination Center of Shengjing Hospital Affiliated to China Medical University. After rigorous data screening, the records of 2105 medical examiners were included for the construction of J48, multilayer perceptron (MLP), Bayes Net, and Naïve Bayes algorithms. A ten-fold cross-validation method was used to verify the recognition model and determine the best classification algorithm for GS. RESULTS: The performance of these models was evaluated using metrics of accuracy, precision, recall, F-measure, and area under the receiver operating characteristic curve. Comparison of the F-measure for each algorithm revealed that the F-measure values for MLP and J48 (0.867 and 0.858, respectively) were not statistically significantly different (p>0.05), although they were significantly higher than the F-measure values for Bayes Net and Naïve Bayes (0.824 and 0.831, respectively; p<0.05). CONCLUSION: The results of this study showed that MLP and J48 algorithms are effective at screening individuals for the risk of GS. The key attributes of data mining can further promote the prevention of GS through targeted community intervention, improve the outcome of GS, and reduce the burden on the medical system.


Subjects
Algorithms , Gallbladder , Adult , Humans , Retrospective Studies , Cross-Sectional Studies , Bayes Theorem , Data Mining/methods
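
The evaluation protocol in the abstract above (ten-fold cross-validation, F-measure) can be sketched without any machine learning library. The helper names are hypothetical and the fold assignment is a plain round-robin split, which may differ from the stratified folds a toolkit would produce; only the record count 2105 comes from the abstract.

```python
from typing import Iterator, List, Tuple


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, computed from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; every record is held out exactly once."""
    folds = [list(range(i, n, k)) for i in range(k)]  # round-robin assignment
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        yield train, held_out


# 2105 records, as in the abstract; a classifier would be fit on `train`
# and scored on `test` inside this loop.
splits = list(k_fold_indices(2105, k=10))
```

Averaging the per-fold F-measures gives one score per algorithm, which is the quantity the abstract compares across J48, MLP, Bayes Net, and Naïve Bayes.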
6.
J Med Internet Res ; 26: e54580, 2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38551633

ABSTRACT

BACKGROUND: The study of disease progression relies on clinical data, including text data, and extracting valuable features from text data has been a research hot spot. With the rise of large language models (LLMs), semantic-based extraction pipelines are gaining acceptance in clinical research. However, the security and feature hallucination issues of LLMs require further attention. OBJECTIVE: This study aimed to introduce a novel modular LLM pipeline, which could semantically extract features from textual patient admission records. METHODS: The pipeline was designed to process a systematic succession of concept extraction, aggregation, question generation, corpus extraction, and question-and-answer scale extraction, which was tested via 2 low-parameter LLMs: Qwen-14B-Chat (QWEN) and Baichuan2-13B-Chat (BAICHUAN). A data set of 25,709 pregnancy cases from the People's Hospital of Guangxi Zhuang Autonomous Region, China, was used for evaluation with the help of a local expert's annotation. The pipeline was evaluated with the metrics of accuracy and precision, null ratio, and time consumption. Additionally, we evaluated its performance via a quantized version of Qwen-14B-Chat on a consumer-grade GPU. RESULTS: The pipeline demonstrates a high level of precision in feature extraction, as evidenced by the accuracy and precision results of Qwen-14B-Chat (95.52% and 92.93%, respectively) and Baichuan2-13B-Chat (95.86% and 90.08%, respectively). Furthermore, the pipeline exhibited low null ratios and variable time consumption. The INT4-quantized version of QWEN delivered an enhanced performance with 97.28% accuracy and a 0% null ratio. CONCLUSIONS: The pipeline exhibited consistent performance across different LLMs and efficiently extracted clinical features from textual data. It also showed reliable performance on consumer-grade hardware. This approach offers a viable and effective solution for mining clinical research data from textual records.


Subjects
Data Mining , Electronic Health Records , Humans , Data Mining/methods , Natural Language Processing , China , Language
7.
PLoS One ; 19(3): e0299582, 2024.
Article in English | MEDLINE | ID: mdl-38517917

ABSTRACT

This paper introduces a model for the translation of natural language into ontology and vice versa in an autonomous navigation system of a sea-going vessel. The system comprehensively executes communication tasks at sea. The authors use machine learning methods in the field of text mining and basic and additional properties of ontologies. The newly developed ontology is applicable in shipping. The key elements of the prototype are the sequence of communication commands given from the ship's bridge, decomposition, extraction of the communication sequence and the rule base. The presented model has been implemented and verified in selected scenarios of collision situations at sea. The test results confirm that the assumptions, the designed system architecture and the algorithms in the prototype are correct.


Subjects
Algorithms , Data Mining , Data Mining/methods , Machine Learning , Language , Communication
8.
Sci Data ; 11(1): 265, 2024 Mar 02.
Article in English | MEDLINE | ID: mdl-38431735

ABSTRACT

It is vital to investigate the complex mechanisms underlying tumors to better understand cancer and develop effective treatments. Metabolic abnormalities and clinical phenotypes can serve as essential biomarkers for diagnosing this challenging disease. Additionally, genetic alterations provide profound insights into the fundamental aspects of cancer. This study introduces Cancer-Alterome, a literature-mined dataset that focuses on the regulatory events of an organism's biological processes or clinical phenotypes caused by genetic alterations. By proposing and leveraging a text-mining pipeline, we identify 16,681 thousand regulatory event records encompassing 21K genes, 157K genetic alterations and 154K downstream bio-concepts, extracted from 4,354K pan-cancer publications. The resulting dataset empowers a multifaceted investigation of cancer pathology, enabling the meticulous tracking of relevant literature support. Its potential applications extend to evidence-based medicine and precision medicine, yielding valuable insights for further advancements in cancer research.


Subjects
Neoplasms , Precision Medicine , Humans , Data Mining/methods , Neoplasms/genetics , Phenotype , Precision Medicine/methods
9.
PLoS One ; 19(2): e0296855, 2024.
Article in English | MEDLINE | ID: mdl-38359072

ABSTRACT

This study aims to enhance governmental decision-making by leveraging advanced topic modeling algorithms to analyze public letters on the "People Call Me" online government inquiry platform in Zhejiang Province, China. Employing advanced web scraping techniques, we collected publicly available letter data from Hangzhou City between June 2022 and May 2023. Initial descriptive statistical analyses and text mining were conducted, followed by topic modeling using the BERTopic algorithm. Our findings indicate that public demands are chiefly focused on livelihood security and rights protection, and these demands exhibit a diversity of characteristics. Furthermore, the public's response to significant emergency events demonstrates both sensitivity and deep concern, underlining its pivotal role in government emergency management. This research not only provides a comprehensive landscape of public demands but also validates the efficacy of the BERTopic algorithm for extracting such demands, thereby offering valuable insights to bolster the government's agility and resilience in emergency responses, enhance public services, and modernize social governance.


Subjects
Data Mining , Government , Humans , China , Data Mining/methods , Employment
10.
BMC Med Res Methodol ; 24(1): 40, 2024 Feb 16.
Article in English | MEDLINE | ID: mdl-38365591

ABSTRACT

PURPOSE: Data mining has been used to help discover frequent patterns in health data. It is widely used to diagnose and prevent various diseases and to identify the causes and factors affecting diseases. Therefore, the aim of the present study is to discover frequent patterns in the data of the Kashan Trauma Registry based on a new method. METHODS: We utilized real data from the Kashan Trauma Registry. After pre-processing, frequent patterns and rules were extracted based on the classical Apriori algorithm and the new method. The new method, based on the weight of variables and the harmonic mean, was implemented in Python for the automatic calculation of minimum support. RESULTS: The results showed that minimum support generation based on feature weighting is done dynamically and level by level, whereas in the classical Apriori algorithm the user manually sets a single minimum support value. Also, the new method performed better than the classical Apriori method in terms of memory consumption, execution time, the number of frequent patterns found, and the generated rules. CONCLUSIONS: This study found that manually determining the minimum support increases execution time and memory usage, which is not cost-effective, especially when the user does not know the dataset's content. In trauma registries and massive healthcare datasets, the new method's ability to uncover common item groups and association rules provides valuable insights. Also, based on the patterns produced in the trauma data, the following are recommended: care of the elderly by their families, education of the general public on how to deal with accident victims and transport them to the hospital, and education of motorcyclists on observing safety points when using a motorcycle.


Subjects
Algorithms , Data Mining , Humans , Aged , Data Mining/methods
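
The abstract above describes an Apriori variant whose minimum support is computed automatically, level by level, from variable weights and a harmonic mean, but gives no formula. The sketch below is therefore only one possible reading, under loud assumptions: at each level the threshold is the harmonic mean of the non-zero support counts of that level's candidates, and variable weights are ignored entirely. The function names and the toy transactions are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List


def harmonic_mean_exact(counts: List[int]) -> Fraction:
    """Exact harmonic mean of positive integers (avoids float comparisons)."""
    return Fraction(len(counts)) / sum(Fraction(1, c) for c in counts)


def apriori_auto_minsup(raw: Iterable[Iterable[str]]) -> Dict[FrozenSet[str], int]:
    """Level-wise Apriori; each level derives its own minimum support count."""
    transactions = [frozenset(t) for t in raw]
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]
    frequent: Dict[FrozenSet[str], int] = {}
    k = 1
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        positive = [n for n in counts.values() if n > 0]
        if not positive:
            break
        min_count = harmonic_mean_exact(positive)  # ASSUMED threshold rule
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent sets that have exactly k items.
        joined = {a | b for a, b in combinations(list(level), 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


# Hypothetical trauma-registry-style transactions.
records = [
    {"fall", "elderly", "home"},
    {"fall", "elderly"},
    {"motorcycle", "head_injury"},
    {"motorcycle", "head_injury", "no_helmet"},
    {"fall", "elderly", "home"},
    {"motorcycle"},
]
patterns = apriori_auto_minsup(records)
```

On this toy data the level-1 threshold works out to a count of 2 and the level-2 threshold to 24/11, so the bar rises as itemsets grow without the user supplying any value, which is the behaviour the abstract contrasts with classical Apriori.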
11.
J Am Med Inform Assoc ; 31(4): 991-996, 2024 Apr 03.
Article in English | MEDLINE | ID: mdl-38218723

ABSTRACT

OBJECTIVE: The aim of the Social Media Mining for Health Applications (#SMM4H) shared tasks is to take a community-driven approach to address the natural language processing and machine learning challenges inherent to utilizing social media data for health informatics. In this paper, we present the annotated corpora, a technical summary of participants' systems, and the performance results. METHODS: The eighth iteration of the #SMM4H shared tasks was hosted at the AMIA 2023 Annual Symposium and consisted of 5 tasks that represented various social media platforms (Twitter and Reddit), languages (English and Spanish), methods (binary classification, multi-class classification, extraction, and normalization), and topics (COVID-19, therapies, social anxiety disorder, and adverse drug events). RESULTS: In total, 29 teams registered, representing 17 countries. In general, the top-performing systems used deep neural network architectures based on pre-trained transformer models. In particular, the top-performing systems for the classification tasks were based on single models that were pre-trained on social media corpora. CONCLUSION: To facilitate future work, the datasets (a total of 61 353 posts) will remain available by request, and the CodaLab sites will remain active for a post-evaluation phase.


Subjects
Social Media , Humans , Data Mining/methods , Neural Networks, Computer , Natural Language Processing , Machine Learning
12.
IEEE J Biomed Health Inform ; 28(4): 2314-2325, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38265897

ABSTRACT

In the biomedical literature, entities are often distributed across multiple sentences and exhibit complex interactions. As the volume of literature has increased dramatically, it has become impractical to manually extract and maintain biomedical knowledge, which would entail enormous costs. Fortunately, document-level relation extraction can capture associations between entities from complex text, helping researchers efficiently mine structured knowledge from the vast medical literature. However, how to effectively synthesize rich global information from context and accurately capture local dependencies between entities is still a great challenge. In this paper, we propose a Local to Global Graphical Reasoning framework (LoGo-GR) based on a novel Biased Graph Attention mechanism (B-GAT). It learns global context features from a mention-level interaction graph and local relation path dependencies from an entity-level path graph, and combines global and local reasoning to capture complex interactions between entities in document-level text. In particular, B-GAT integrates structural dependencies into the standard graph attention mechanism (GAT) as attention biases to adaptively guide information aggregation in graphical reasoning. We evaluate our method on three public biomedical document-level datasets: Drug-Mutation Interaction (DV), Chemical-induced Disease (CDR), and Gene-Disease Association (GDA). LoGo-GR shows advanced and stable performance compared with other state-of-the-art methods (it achieves state-of-the-art performance with 96.14%-97.39% F1 on the DV dataset, and advanced performance with 68.89% F1 and 84.22% F1 on the CDR and GDA datasets, respectively). In addition, LoGo-GR also shows advanced performance on the general-domain document-level relation extraction dataset DocRED, which indicates that it is an effective and robust document-level relation extraction framework.


Subjects
Data Mining , Humans , Data Mining/methods
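
The abstract above says structural dependencies enter graph attention "as attention biases". A generic way to do that is to add a bias term to each raw attention score before the softmax. The sketch below shows only that generic mechanism for one node and its neighbors; it is not the B-GAT formulation, and the function name, scores, and bias values are hypothetical.

```python
from math import exp
from typing import List


def biased_attention(scores: List[float], bias: List[float]) -> List[float]:
    """Softmax over (score + bias) for one node's neighbors.

    A bias of float('-inf') masks a neighbor out completely; at least one
    neighbor is assumed to have a finite biased score.
    """
    logits = [s + b for s, b in zip(scores, bias)]
    peak = max(logits)                      # subtract the max for numerical stability
    weights = [exp(l - peak) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Three neighbors with equal raw scores: the structural bias alone decides
# how much each contributes to the aggregated message.
weights = biased_attention([1.0, 1.0, 1.0], [0.0, 2.0, float("-inf")])
```

With identical raw scores, the neighbor carrying the larger bias dominates and the masked neighbor receives zero weight, which is the sense in which a bias can steer aggregation toward structurally related mentions.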
13.
J Biomed Inform ; 150: 104599, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38272433

ABSTRACT

OBJECTIVE: Event extraction plays a crucial role in natural language processing. However, in the biomedical domain, the presence of nested events adds complexity to event extraction compared to single events, and these events usually have strong semantic relationships and constraints. Previous approaches ignored the binding connections between these complex nested events. This study aims to develop a unified framework based on event constraint information that jointly extracts biomedical event triggers and arguments and enhances the performance of nested biomedical event extraction. MATERIAL AND METHODS: We propose a multi-task learning framework based on constraint information, called CMBEE, for the task of biomedical event extraction. The N-tuple form of event patterns is used to represent the constraint information, which is integrated into the role detection and event type classification tasks. The framework uses attention and gating mechanisms to explore the fusion of multiple tuple information, as well as local and global constraint information fusion methods to dig further into the connections between events. RESULTS: Experimental results demonstrate that our proposed method achieves the highest F1 score on a multilevel event extraction biomedical (MLEE) corpus and performs favorably on the biomedical natural language processing shared task 2013 Genia event corpus (GE 13). CONCLUSIONS: The experimental results indicate that modeling event patterns and constraints for multi-event extraction tasks is effective for complex biomedical event extraction. The fusion strategy proposed in this study, which incorporates different constraint information, helps to better express semantic information.


Subjects
Machine Learning , Natural Language Processing , Semantics , Data Mining/methods
14.
Bioinformatics ; 40(1)2024 01 02.
Article in English | MEDLINE | ID: mdl-38258418

ABSTRACT

MOTIVATION: Scientific advances build on the findings of existing research. The 2001 publication of the human genome has led to the production of huge volumes of literature exploring the context-specific functions and interactions of genes. Technology is needed to perform large-scale text mining of research papers to extract the reported actions of genes in specific experimental contexts and cell states, such as cancer, thereby facilitating the design of new therapeutic strategies. RESULTS: We present a new corpus and Text Mining methodology that can accurately identify and extract the most important details of cancer genomics experiments from biomedical texts. We build a Named Entity Recognition model that accurately extracts relevant experiment details from PubMed abstract text, and a second model that identifies the relationships between them. This system outperforms earlier models and enables the analysis of gene function in diverse and dynamically evolving experimental contexts. AVAILABILITY AND IMPLEMENTATION: Code and data are available here: https://github.com/cambridgeltl/functional-genomics-ie.


Subjects
Genomics , Neoplasms , Humans , Neoplasms/genetics , Data Mining/methods , PubMed , Phenotype
15.
Eval Rev ; 48(2): 370-398, 2024 Apr.
Article in English | MEDLINE | ID: mdl-37195259

ABSTRACT

The impact of pro-environmental behavior on policymaking has been an exciting area of research. While the relationship between pro-environmental behavior and policymaking has been explored in numerous studies, there needs to be more synthesis on this topic. This is the first text-mining study of pro-environmental effects in which policymaking is a significant factor. In response, this study, for the first time, takes a novel approach by using text mining in R programming to analyze 30 publications from the Scopus database on pro-environmental behavior in policymaking, highlighting major research themes and prospective research areas for future investigation. Results from text mining yielded 10 topic models, which are presented with a synopsis of the published research and a list of the primary authors, as well as a posterior probability via latent Dirichlet allocation (LDA). Additionally, the study conducts a trend analysis of the top 10 journals with the highest impact factor, considering the influence of each journal's mean citation. The study offers an overview of the impacts of pro-environmental behavior in policymaking, showing the most relevant and frequently discussed themes, introduces the scientific visualization of papers published in the Scopus database, and proposes future study directions. These findings can help researchers and environmental specialists better understand how pro-environmental behavior can be fostered more effectively through policymaking.


Subjects
Bibliometrics , Publications , Prospective Studies , Data Mining/methods , Databases, Factual
16.
Accid Anal Prev ; 195: 107421, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38061291

ABSTRACT

Accurately and quickly mining the hidden information in railway dangerous goods transportation (RDGT) accident reports is of great significance for safety management. In this paper, a data mining method, Logistics-DT-TFP, is proposed for analysing the causes of RDGT accidents. First, the transportation process is analysed, the causes of each accident are extracted, and the severity of the accident is classified. Then, ordered multi-class logistic regression is used for correlation calculation to qualitatively judge and quantitatively analyse the relationship between each cause and the severity of the accident. The feature tags of the Decision Tree (DT) are screened, and the C5.0 algorithm is used to obtain the accident coupling rules. Next, the FP-Growth algorithm is used to mine frequent itemsets, and TOP-K is used to improve it and output effective association rules with lift as the indicator, which avoids repeated traversal of the database, reduces the time complexity, and reduces the impact of the minimum support setting on the calculation results. The lift among the causes in the coupling chain is calculated as a complement to the extraction of coupling rules. Finally, based on the analysis and mining results of a case study, management strategies for railway dangerous goods are proposed.


Subjects
Accidents, Traffic , Transportation , Humans , Causality , Data Mining/methods , Algorithms
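
Lift, the indicator the abstract above uses to rank association rules, is the ratio of a rule's observed joint support to the support expected if antecedent and consequent were independent. The sketch below computes lift and keeps the top K single-cause rules by brute force. It does not implement FP-Growth or the paper's TOP-K improvement, and the function names and accident causes are hypothetical.

```python
import heapq
from itertools import permutations
from typing import FrozenSet, List, Set, Tuple


def lift(transactions: List[FrozenSet[str]], antecedent: Set[str], consequent: Set[str]) -> float:
    """lift(A -> C) = supp(A and C) / (supp(A) * supp(C)); 0.0 if either side never occurs."""
    n = len(transactions)
    a = sum(1 for t in transactions if antecedent <= t)
    c = sum(1 for t in transactions if consequent <= t)
    both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    if a == 0 or c == 0:
        return 0.0
    return both * n / (a * c)


def top_k_rules(transactions: List[FrozenSet[str]], k: int) -> List[Tuple[float, str, str]]:
    """The k single-item rules (lift, cause, effect) with the highest lift."""
    items = sorted({i for t in transactions for i in t})
    scored = [(lift(transactions, {a}, {b}), a, b) for a, b in permutations(items, 2)]
    return heapq.nlargest(k, scored)


# Hypothetical accident reports, each reduced to a set of causes.
reports = [
    frozenset({"leak", "fire"}),
    frozenset({"leak", "fire"}),
    frozenset({"overload"}),
    frozenset({"leak"}),
]
best = top_k_rules(reports, k=2)
```

A lift above 1 means the two causes co-occur more often than independence would predict; ranking by lift and keeping only the top K removes the need to pick a minimum confidence threshold by hand.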
17.
Mol Cell Proteomics ; 23(1): 100682, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37993103

ABSTRACT

Global phosphoproteomics experiments quantify tens of thousands of phosphorylation sites. However, data interpretation is hampered by our limited knowledge on functions, biological contexts, or precipitating enzymes of the phosphosites. This study establishes a repository of phosphosites with associated evidence in biomedical abstracts, using deep learning-based natural language processing techniques. Our model for illuminating the dark phosphoproteome through PubMed mining (IDPpub) was generated by fine-tuning BioBERT, a deep learning tool for biomedical text mining. Trained using sentences containing protein substrates and phosphorylation site positions from 3000 abstracts, the IDPpub model was then used to extract phosphorylation sites from all MEDLINE abstracts. The extracted proteins were normalized to gene symbols using the National Center for Biotechnology Information gene query, and sites were mapped to human UniProt sequences using ProtMapper and mouse UniProt sequences by direct match. Precision and recall were calculated using 150 curated abstracts, and utility was assessed by analyzing the CPTAC (Clinical Proteomics Tumor Analysis Consortium) pan-cancer phosphoproteomics datasets and the PhosphoSitePlus database. Using 10-fold cross validation, pairs of correct substrates and phosphosite positions were extracted with an average precision of 0.93 and recall of 0.94. After entity normalization and site mapping to human reference sequences, an independent validation achieved a precision of 0.91 and recall of 0.77. The IDPpub repository contains 18,458 unique human phosphorylation sites with evidence sentences from 58,227 abstracts and 5918 mouse sites in 14,610 abstracts. This included evidence sentences for 1803 sites identified in CPTAC studies that are not covered by manually curated functional information in PhosphoSitePlus. Evaluation results demonstrate the potential of IDPpub as an effective biomedical text mining tool for collecting phosphosites. 
Moreover, the repository (http://idppub.ptmax.org), which can be automatically updated, can serve as a powerful complement to existing resources.


Subjects
Data Mining , Natural Language Processing , Humans , Data Mining/methods , Databases, Factual , PubMed
18.
IEEE Trans Vis Comput Graph ; 30(1): 1227-1237, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38015695

ABSTRACT

Personalized head and neck cancer therapeutics have greatly improved survival rates for patients, but often lead to understudied long-lasting symptoms that affect quality of life. Sequential rule mining (SRM) is a promising unsupervised machine learning method for predicting longitudinal patterns in temporal data which, however, can output many repetitive patterns that are difficult to interpret without the assistance of visual analytics. We present a data-driven, human-machine analysis visual system developed in collaboration with SRM model builders in cancer symptom research, which facilitates mechanistic knowledge discovery in large-scale, multivariate cohort symptom data. Our system supports multivariate predictive modeling of post-treatment symptoms based on during-treatment symptoms. It supports this goal through an SRM, clustering, and aggregation back end, and a custom front end to help develop and tune the predictive models. The system also explains the resulting predictions in the context of therapeutic decisions typical in personalized care delivery. We evaluate the resulting models and system with an interdisciplinary group of modelers and head and neck oncology researchers. The results demonstrate that our system effectively supports clinical and symptom research.


Subjects
Rosa; Humans; Quality of Life; Computer Graphics; Data Mining/methods
19.
Bioinformatics ; 40(1), 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-38147362

ABSTRACT

MOTIVATION: Up-to-date pathway knowledge is usually presented in scientific publications for human reading, making it difficult to utilize these resources for semantic integration and computational analysis of biological pathways. We here present an approach to mining knowledge graphs by combining manual curation with automated named entity recognition and automated relation extraction. This approach allows us to study pathway-related questions in detail, which we here show using the ketamine pathway, aiming to help improve understanding of the role of gut microbiota in the antidepressant effects of ketamine. RESULTS: The thus devised ketamine pathway 'KetPath' knowledge graph comprises five parts: (i) manually curated pathway facts from images; (ii) recognized named entities in biomedical texts; (iii) identified relations between named entities; (iv) our previously constructed microbiota and pre-/probiotics knowledge bases; and (v) multiple community-accepted public databases. We first assessed the performance of automated extraction of relations between named entities using the specially designed state-of-the-art tool BioKetBERT. The query results show that we can retrieve drug actions, pathway relations, co-occurring entities, and their relations. These results uncover several biological findings, such as various gut microbes leading to increased expression of BDNF, which may contribute to the sustained antidepressant effects of ketamine. We envision that the methods and findings from this research will aid researchers who wish to integrate and query data and knowledge from multiple biomedical databases and literature simultaneously. AVAILABILITY AND IMPLEMENTATION: Data and query protocols are available in the KetPath repository at https://dx.doi.org/10.5281/zenodo.8398941 and https://github.com/tingcosmos/KetPath.
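
As a rough illustration of how relations between named entities can be stored and queried, the sketch below keeps a knowledge graph as subject-relation-object triples and filters them by pattern. It does not reflect the KetPath schema or query protocol; the `Triple` type, the `query` helper, the relation names, and the entity `gut_microbe_A` are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    relation: str
    obj: str


def query(triples: List[Triple],
          subject: Optional[str] = None,
          relation: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return triples matching every field that is not None (None is a wildcard)."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (relation is None or t.relation == relation)
        and (obj is None or t.obj == obj)
    ]


# Hypothetical triples for illustration; not extracted from KetPath.
graph = [
    Triple("gut_microbe_A", "increases_expression_of", "BDNF"),
    Triple("BDNF", "associated_with", "antidepressant_effect"),
    Triple("ketamine", "has_effect", "antidepressant_effect"),
]

# Which entities are reported to increase BDNF expression?
for triple in query(graph, relation="increases_expression_of", obj="BDNF"):
    print(triple.subject)
```

A production knowledge graph would normally sit in a graph database or RDF store with a dedicated query language; the list-based filter here only shows the shape of the data and of a pattern query.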


Subjects
Gastrointestinal Microbiome; Ketamine; Humans; Ketamine/pharmacology; Databases, Factual; Antidepressive Agents/pharmacology; Neurotransmitter Agents; Data Mining/methods
20.
Sensors (Basel) ; 23(23), 2023 Nov 23.
Article in English | MEDLINE | ID: mdl-38067736

ABSTRACT

The rapid growth of electronic health records (EHRs) has led to unprecedented volumes of biomedical data. Clinician access to the latest patient information can improve the quality of healthcare. However, clinicians have difficulty finding information quickly and easily due to the sheer volume of data. Biomedical information retrieval (BIR) systems can help clinicians find the information required by automatically searching EHRs and returning relevant results. However, traditional BIR systems cannot understand the complex relationships between EHR entities. Transformers are a new type of neural network that is very effective for natural language processing (NLP) tasks. As a result, transformers are well suited for tasks such as machine translation and text summarization. In this paper, we propose a new BIR system for EHRs that uses transformers for predicting cancer treatment from EHRs. Our system can understand the complex relationships between the different entities in an EHR, which allows it to return more relevant results to clinicians. We evaluated our system on a dataset of EHRs and found that it outperformed state-of-the-art BIR systems on various tasks, including medical question answering and information extraction. Our results show that transformers are a promising approach for BIR in EHRs, reaching an accuracy of 86.46% and an F1-score of 0.8157. We believe that our system can help clinicians find the information they need more quickly and easily, leading to improved patient care.


Subjects
Electronic Health Records; Neoplasms; Humans; Data Mining/methods; Natural Language Processing; Neural Networks, Computer; Information Systems; Neoplasms/therapy